[DPE-4115] Performance Profile Support #466
Conversation
Sync charm docs from https://discourse.charmhub.io Co-authored-by: a-velasco <[email protected]>
Currently, we are hitting a lot of timeouts in CA rotation testing. Splitting between small and large deployments and having parallel runners will help with the overall duration.
… and its config changes
Thanks @phvalguima.
I have a few points:
1. Replication factor:
We should never set the replication factor to 1-all: this is unnatural for non-system indices and may cause the disks of all units to overflow quickly. Replication factors are set by the user. If we want to be safer, we should look for indices whose number_of_replicas < 1 and set it to 1, not more.
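A minimal sketch of that safer rule, assuming the input is the parsed body of a GET /_all/_settings response; the function name and the hidden-index filter are illustrative assumptions, not charm code:

```python
def indices_needing_replicas(settings: dict, minimum: int = 1) -> list[str]:
    """List indices whose number_of_replicas is below the minimum."""
    needs_bump = []
    for name, body in settings.items():
        if name.startswith("."):  # leave hidden/system indices alone
            continue
        replicas = int(body["settings"]["index"].get("number_of_replicas", minimum))
        if replicas < minimum:
            needs_bump.append(name)
    return needs_bump
```

Only the listed indices would then get a PUT /{index}/_settings raising number_of_replicas to 1; anything at or above the minimum is untouched.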
2. Codec used:
Why did you choose zstd_no_dict instead of zstd, or instead of qat_lz4 (on supported systems), or others? Which one should we choose, and how? It may also be necessary to set a compression level with each; again, which one, and how?
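For concreteness, a sketch of a component-template body that pins both settings; "zstd_no_dict" and the level value here are placeholders to benchmark, not final picks:

```python
# Illustrative component-template body (the payload of
# PUT /_component_template/<name>), pinning codec and compression level.
codec_component_template = {
    "template": {
        "settings": {
            "index.codec": "zstd_no_dict",
            # the zstd codecs also accept a compression level (default 3)
            "index.codec.compression_level": 3,
        }
    }
}
```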
3. Heap size:
In production, the heap size should be set to 50% of RAM but kept below 32GB, as per the official OpenSearch recommendations: here and here.
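That recommendation can be sketched as a single clamp, working in kB as discussed below; the 31GB cap is a conservative margin under the 32GB compressed-oops threshold and the helper name is illustrative:

```python
def heap_size_kb(total_ram_kb: int) -> int:
    """Half the RAM, hard-capped below 32GB (values in kB)."""
    cap_kb = 31 * 1024 * 1024  # stay safely under the 32GB threshold
    return min(total_ram_kb // 2, cap_kb)
```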
4. Units conversion:
The main complexity of this PR revolves around unit conversions. Two things to note:
- /proc/meminfo always returns units in kB.
  1. Any reason not to use psutil? That should save you the parsing of /proc/meminfo.
- The JVM Xms and Xmx properties accept either g|G, m|M or k|K, meaning the smallest unit is k.
With this in mind, it makes sense to normalize the values to the smallest unit supported by both, which is the kB, and work with it exclusively (when reading, calculating, or writing to file).
Something along the lines of:
import re

def jvm_size_in_kb(size: str) -> int:
    """Normalize the size values set in jvm.options from supported units to kB."""
    match = re.match(r"(\d+)([gmk])$", size.lower())
    if not match:
        raise ValueError(f"Unsupported JVM size value: {size}")
    value = int(match.group(1))
    factor = {"k": 1, "m": 1024, "g": 1024 * 1024}[match.group(2)]
    return value * factor
With this, I believe we do not need the ByteUnit and JavaByteSize classes. Similarly, the percentage method can simply be calculated as int(0.25 * val) ==> again, the loss from rounding down is minimal because we are dealing with kilobytes.
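The write path then needs no conversion back either, since k is itself a valid JVM suffix; a sketch (helper name is illustrative, assuming Xms and Xmx are kept equal as is conventional):

```python
def jvm_heap_options_from_kb(kb: int) -> list[str]:
    """Render jvm.options heap lines directly from a kB value."""
    return [f"-Xms{kb}k", f"-Xmx{kb}k"]
```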
Hi @Mehdi-Bendriss, I will go over your points one by one.
Indeed, I corrected that right after our conversation earlier this week in this commit. Although I agree with the removal of
True, but per our specification: that is why I am trying to put together a minimal "index template", which at least assures a user that indices will be replicated, unless that user explicitly states otherwise.
TBH, I am not entirely set on each of the values we will discuss below. That is one of the reasons I have added them as component templates, so a user can build their own index templates on top. I'd rather benchmark these for comparison later. Maybe getting that landed in this PR is too much. TL;DR: I think we can play it safer as follows:
WDYT? I will start with a later question:
The codec selection came from these results.
In the case of
Happy to set QAT if we get the logic to detect and enable it. We are not yet there.
First, we should keep in mind that some types of indices cannot simply work with all the codecs (e.g. the vector indices). As you also noticed, I am not really on board with these results, as there are other parameters to be set. In this case, the "How?" will have to be answered the same way we've done the AVX testing: by running performance tests and comparing results.
Yes, I also have this concern. What I think we should do here is run these component templates against benchmarks and document the results.
Thanks, that is a really good point. I will set a hard limit of up to 32G. Indeed, having the GC going over 100s of gigs does not sound good :)
So, some thoughts here: (1) indeed, sticking with kB, as long as the JVM can accept "32G" in kB format, is a good idea and would simplify a lot; and (2) I was discussing how the Big Data team could benefit from this logic here. The idea is to eventually move to the
That is true as the kernel code shows.
Yes, two reasons: (1) the parsing goes from L589 to L598... I prefer 10 LoCs that process a file whose format is set in stone by the kernel than adding a new dependency (that actually is not shipped with stock Ubuntu anyways); and (2) it gives access to hugepages info as well, which we can potentially explore later, and would be "just there".
Yes, I was going down to bytes and coming back, but indeed it makes sense to stop at kB instead and handle all of this in kB.
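A minimal sketch of the ~10-line parser discussed above; since the kernel reports every /proc/meminfo value in kB, no unit detection is needed (the function name is illustrative):

```python
def meminfo_total_kb(path: str = "/proc/meminfo") -> int:
    """Read MemTotal from /proc/meminfo; the kernel always reports kB."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                # e.g. "MemTotal:       16384 kB"
                return int(line.split()[1])
    raise ValueError("MemTotal not found in " + path)
```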
Thanks Pedro!
if not self.peers_data.get(Scope.UNIT, PERFORMANCE_PROFILE):
    return None
return OpenSearchPerfProfile.from_dict(
    {"typ": self.peers_data.get(Scope.UNIT, PERFORMANCE_PROFILE)}
)
nit for conciseness:
if not (profile := self.peers_data.get(Scope.UNIT, PERFORMANCE_PROFILE)):
    return None
return OpenSearchPerfProfile.from_dict({"typ": profile})
Looks good! Only one minor thing that could potentially be changed, depending on your preference.
perf_profile_needs_restart = False
This can be removed; it will be overwritten with True or False in line 738.
This PR extends the current charm to support performance profiles, following spec DA031, and adds support for the following settings, depending on the selected profile:
- heap size set to max(1G, 10% of RAM) or max(1G, 50% of RAM)
- indices.memory.index_buffer_size extended to 25%
The options above are set based on the following documents:
https://opensearch.org/docs/latest/tuning-your-cluster/performance/
https://opensearch.org/docs/latest/search-plugins/knn/performance-tuning/
The user can switch between the three options above, and depending on the selected value, the templates are created or destroyed.
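That switch could be sketched as a pure mapping from profile to template API calls; the template name and the production-only rule here are assumptions for illustration, not the charm's actual logic:

```python
def template_ops(profile: str) -> list[tuple[str, str]]:
    """Translate a profile change into (HTTP method, path) template calls."""
    tpl = "/_component_template/charmed-perf-tpl"
    if profile == "production":
        return [("PUT", tpl)]
    # dropping the template does not touch existing indices, since index
    # templates only apply at index creation time
    return [("DELETE", tpl)]
```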
UPDATE
One important question about this PR is whether index templates with '*' will apply to system indices. The first part of the answer: index templates are only applied at index creation, as shown here. We can delete index templates after the indices based on that template have been created.
Manual configuration (e.g. setting 0-all) will always take precedence. There is an exception for hidden (not necessarily system) indices: "catch-all" templates are not applied to hidden indices.